`DataFrame` Indexing and Loading
Introduction to CSV Files
- CSV (Comma Separated Values): A lightweight, ubiquitous format for data storage.
- Commonly used by spreadsheet software like Excel and Google Sheets.
- Format is flexible but lacks a strict specification, leading to possible variations in structure.
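One common variation is the delimiter. The sketch below uses made-up data (not the admissions dataset) to show the same records written with commas and with semicolons, and how the `sep` parameter of `pd.read_csv()` lets pandas read both the same way.

```python
import io

import pandas as pd

# Hypothetical data: the same two records written with two different delimiters.
comma_text = "name,score\nAda,90\nGrace,95\n"
semicolon_text = "name;score\nAda;90\nGrace;95\n"

# The default delimiter is a comma.
comma_df = pd.read_csv(io.StringIO(comma_text))

# For the semicolon variant the delimiter has to be stated explicitly;
# otherwise each line would be read as one single column.
semicolon_df = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(list(semicolon_df.columns))
print(comma_df.equals(semicolon_df))
```

`io.StringIO` stands in for a file on disk here so the example runs without any dataset.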
Viewing CSV File Contents
- Viewing CSV Files with Shell Commands:
  - Jupyter notebooks can run shell commands by prefixing them with an exclamation mark (`!`).
  - Example: Using `!cat` to display the contents of a CSV file:
        !cat datasets/Admission_Predict.csv
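The `!` prefix is an IPython/Jupyter feature and is not valid in a plain Python script. As a rough equivalent, the sketch below previews the first lines of a file using only the standard library. The helper `preview` and the sample file contents are hypothetical, introduced just for illustration.

```python
import tempfile
from itertools import islice
from pathlib import Path


def preview(path, n=5):
    """Return the first n lines of a text file, roughly like `head -n`."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


# A made-up stand-in file so the example does not depend on any dataset.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        "Serial No.,GRE Score,TOEFL Score\n1,330,115\n2,320,108\n",
        encoding="utf-8",
    )
    lines = preview(sample, n=2)

for line in lines:
    print(line)
```

Reading only the first few lines avoids dumping a large file to the screen, which `cat` would do.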
Loading CSV Data into a DataFrame
- Import pandas:
    import pandas as pd
- Loading Data:
    df = pd.read_csv('datasets/Admission_Predict.csv')
- Viewing Data:
    df.head()
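The three steps above can be tried without the dataset file. This is a minimal sketch that assumes a small, made-up CSV held in a string; the column names only mimic those used in these notes.

```python
import io

import pandas as pd

# Made-up rows; only the header mimics the columns discussed in these notes.
csv_text = """Serial No.,GRE Score,TOEFL Score,CGPA
1,330,115,9.1
2,320,108,8.7
3,315,104,8.0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(df.head(2))        # head() takes an optional row count (default 5)
print(list(df.index))    # with no index column given, rows are labelled 0, 1, 2, ...
```

Note that `Serial No.` is loaded as an ordinary column here, and pandas generates its own integer index alongside it.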
Setting a Specific Column as the Index
- Using the `index_col` parameter:
    df = pd.read_csv('datasets/Admission_Predict.csv', index_col=0)
    df.head()
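To make the effect of `index_col` concrete, here is a small sketch on made-up data. It compares selecting the index by position, selecting it by name, and calling `set_index()` after loading. The last two are alternatives not used in the notes above, shown only for comparison.

```python
import io

import pandas as pd

# Made-up data with a serial-number column in first position.
csv_text = "Serial No.,GRE Score,TOEFL Score\n1,330,115\n2,320,108\n"

# index_col accepts a column position ...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
# ... or a column name.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Serial No.")
# The same result can be reached after loading with set_index().
after_load = pd.read_csv(io.StringIO(csv_text)).set_index("Serial No.")

print(by_position.index.name)
print(list(by_position.columns))
# Rows can now be looked up by serial number instead of by 0-based position.
print(by_position.loc[2, "GRE Score"])
```

Using the column name instead of `0` is slightly more robust if the column order in the file ever changes.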
Renaming Columns
- Basic Renaming Using a Dictionary:
    new_df = df.rename(columns={
        'GRE Score': 'GRE Score',
        'TOEFL Score': 'TOEFL Score',
        'University Rating': 'University Rating',
        'SOP': 'Statement of Purpose',
        'LOR': 'Letter of Recommendation',
        'CGPA': 'CGPA',
        'Research': 'Research',
        'Chance of Admit': 'Chance of Admit'
    })
    new_df.head()
- Identifying Issues with Column Names: `LOR` is not renamed by the call above, so inspect the actual labels:
    new_df.columns
- Addressing Issues with Spaces: the label is stored as `'LOR '`, with a trailing space, so the dictionary key must include it:
    new_df = new_df.rename(columns={'LOR ': 'Letter of Recommendation'})
    new_df.head()
- Using `str.strip()` for Robustness: strip whitespace from every column name rather than fixing labels one by one:
    new_df = new_df.rename(mapper=str.strip, axis='columns')
    new_df.head()
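The renaming pitfall can be reproduced on a tiny made-up frame. This sketch assumes headers with trailing spaces similar to those described above. It also uses the `errors="raise"` option of `rename()`, which is not part of the original notes but makes a mismatched key fail loudly instead of being silently ignored.

```python
import pandas as pd

# Made-up frame whose headers mimic the trailing-space problem.
df = pd.DataFrame({"SOP": [4.5], "LOR ": [4.0], "Chance of Admit ": [0.9]})

# 'LOR' does not match 'LOR ', and rename() ignores unknown keys by default,
# so the columns come back unchanged and no error is raised.
unchanged = df.rename(columns={"LOR": "Letter of Recommendation"})
print(list(unchanged.columns))

# errors="raise" turns the silent miss into a KeyError.
try:
    df.rename(columns={"LOR": "Letter of Recommendation"}, errors="raise")
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Stripping whitespace first lets the clean keys match.
cleaned = (
    df.rename(mapper=str.strip, axis="columns")
      .rename(columns={"SOP": "Statement of Purpose",
                       "LOR": "Letter of Recommendation"})
)
print(list(cleaned.columns))
```

`rename()` returns a new DataFrame in each case; `df` itself keeps its original labels.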
Modifying Column Names Directly
- Using the `df.columns` Attribute:
    cols = list(df.columns)
    cols = [x.lower().strip() for x in cols]
    df.columns = cols
    df.head()
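A short sketch of the direct-assignment approach on made-up data follows. It also shows the `.str` accessor on the column index as an alternative to the list comprehension (an addition here, not something the notes use), and what happens when the new list has the wrong length.

```python
import pandas as pd

# Made-up frame with mixed case and trailing spaces in the headers.
df = pd.DataFrame({"GRE Score": [330], "LOR ": [4.0], "Chance of Admit ": [0.9]})

# List-comprehension approach, as in the notes. Assigning to .columns
# modifies the DataFrame in place, so work on a copy here.
via_list = df.copy()
via_list.columns = [name.lower().strip() for name in via_list.columns]

# Alternative: the .str accessor applies string methods to every label.
via_str = df.copy()
via_str.columns = via_str.columns.str.lower().str.strip()

print(list(via_list.columns))
print(list(via_list.columns) == list(via_str.columns))

# The new list must have exactly one name per column.
broken = df.copy()
try:
    broken.columns = ["only one name"]
    mismatch_error = False
except ValueError as exc:
    mismatch_error = True
    print("ValueError:", exc)
```

Unlike `rename()`, this assignment changes the DataFrame it is applied to, which is why the sketch works on copies.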
Summary
- Loading CSV into pandas DataFrame: Utilized `pd.read_csv()`.
- Basic Data Cleaning: Performed column renaming and handled issues with extra spaces.
- Direct Column Modification: Demonstrated using lists and list comprehensions for efficient renaming.